In this tutorial we show how to use our Python package (discussiontree) to analyze and visualize the hierarchical structure of discussions in online forums. The package was developed to analyze online discussions on the Cubadebate platform (a WordPress site, http://www.cubadebate.cu/), but it can easily be adapted to other platforms. Its main limitation is that it requires the forum discussion as input in JSON format, so this information must first be extracted with another extraction tool or a discussion API (for example, the Diffbot API, https://www.diffbot.com/). In general, the workflow consists of the following steps:
Step 0: JSON Extraction using Diffbot (https://www.diffbot.com/)
Step 1: Read JSON data and generate the posts_dataset (.CSV format)
Step 2: Cleaning the posts dataset (manual operation)
Step 3: Information extraction
Step 4: Discussion Tree Generation
Step 5: Discussion Tree Visualization
In this step, an external tool is used to extract the online discussion information in JSON format. This tutorial uses Diffbot.
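To make the expected input concrete, the sketch below loads a small, hypothetical discussion exported as JSON. It assumes the posts sit in a top-level `"posts"` list and that top-level comments have a `null` parentId; both are assumptions for illustration, and the real Diffbot output contains more fields and may be nested differently. The field names are the ones the tutorial later keeps in `source_data`.

```python
import json

# Hypothetical, trimmed discussion export. The "posts" key and the null parentId
# for top-level comments are assumptions made for this sketch.
SAMPLE_JSON = """
{"posts": [
  {"id": 1, "parentId": null, "author": "ana", "date": "2019-03-01",
   "text": "Top-level comment", "humanLanguage": "es", "diffbotUri": "post|1"},
  {"id": 2, "parentId": 1, "author": "luis", "date": "2019-03-01",
   "text": "Reply to ana", "humanLanguage": "es", "diffbotUri": "post|2"},
  {"id": 3, "parentId": 2, "author": "ana", "date": "2019-03-02",
   "text": "Reply to luis", "humanLanguage": "es", "diffbotUri": "post|3"}
]}
"""


def load_posts(text):
    """Return the list of post dictionaries stored under the (assumed) "posts" key."""
    return json.loads(text)["posts"]


posts = load_posts(SAMPLE_JSON)
for post in posts:
    # Print id, parent id and author: the fields that define the reply structure.
    print(post["id"], post["parentId"], post["author"])
```

The `id`/`parentId` pair is what later allows the replies to be arranged into a tree.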
# Import the main functionality of the discussiontree package
from discussiontree.processing import processing_json, information_extraction
from discussiontree.utils import decode_json, discussion_outdir, tree_data
from discussiontree.DTree import DiscussionTree
from discussiontree.utils2plot import tree_plotting
# Global settings
# Input directory containing the .json files
in_dir = 'DISCUSSIONS_CLEAN'
# Output directory
out_dir = 'DISCUSSIONS_OUT'
# JSON fields to keep for each post
source_data = ['date','author','id','parentId','diffbotUri','humanLanguage','text']
# Discussion settings
# JSON file of the discussion to process
filename = 'Discussion29.json'
# Output directory for this discussion
output_dir = discussion_outdir(out_dir,filename)
# Step 1: Read JSON data and generate the posts_dataset (.CSV format)
# processing .json
processing_json(in_dir, output_dir, filename, source_data)
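As a rough idea of what Step 1 involves, the sketch below flattens a JSON discussion into a CSV posts dataset that keeps only the `source_data` fields. It is not the implementation of `processing_json` (which writes to `output_dir`); it writes to a string instead, and the `"posts"` key is again an assumed JSON layout.

```python
import csv
import io
import json

# Fields kept for each post, as in the tutorial's source_data list.
SOURCE_DATA = ['date', 'author', 'id', 'parentId', 'diffbotUri', 'humanLanguage', 'text']

# Hypothetical input; the top-level "posts" key is an assumption about the JSON layout.
SAMPLE_JSON = """{"posts": [
  {"id": 1, "parentId": null, "author": "ana", "date": "2019-03-01", "text": "Hi",
   "humanLanguage": "es", "diffbotUri": "post|1", "extra": "dropped"},
  {"id": 2, "parentId": 1, "author": "luis", "date": "2019-03-01", "text": "Hello, ana",
   "humanLanguage": "es", "diffbotUri": "post|2"}
]}"""


def posts_to_csv(json_text, fields):
    """Return CSV text with one row per post, keeping only the requested fields."""
    posts = json.loads(json_text)["posts"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for post in posts:
        # Missing fields become empty cells; None (JSON null) is also written as empty.
        writer.writerow({name: post.get(name, "") for name in fields})
    return buffer.getvalue()


csv_text = posts_to_csv(SAMPLE_JSON, SOURCE_DATA)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means texts containing commas (such as `"Hello, ana"`) are quoted correctly.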
In this step, verify that the online discussion information was extracted correctly. Any error detected must be fixed manually, directly in the posts dataset (CSV file).
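The cleaning itself is manual, but a few consistency checks can point to rows worth inspecting. The sketch below is a hypothetical helper, not part of discussiontree: it flags duplicate ids, replies whose `parentId` does not match any post, and empty texts in a small sample dataset.

```python
import csv
import io

# Hypothetical posts dataset with two deliberate problems:
# post 3 replies to a post (99) that is not in the dataset, and post 4 has empty text.
SAMPLE_CSV = """id,parentId,author,text
1,,ana,Top-level comment
2,1,luis,Reply to ana
3,99,marta,Reply to a missing post
4,1,pedro,
"""


def find_problems(rows):
    """Return messages for rows that probably need a manual fix."""
    problems = []
    seen_ids = set()
    all_ids = {row["id"] for row in rows}
    for row in rows:
        if row["id"] in seen_ids:
            problems.append(f"duplicate id {row['id']}")
        seen_ids.add(row["id"])
        parent = row["parentId"]
        # An empty parentId marks a top-level comment; any other value must exist.
        if parent and parent not in all_ids:
            problems.append(f"post {row['id']}: parentId {parent} not found")
        if not row["text"].strip():
            problems.append(f"post {row['id']}: empty text")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for message in find_problems(rows):
    print(message)
```

A broken `parentId` is the kind of error that would otherwise surface later, when the discussion tree is built.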
In this step, the information needed to build the discussion trees is extracted: the list of unique authors (Authors_List), the interactions by author ID (Interactions_by_AuthorID), each author's participation (Author_Participation), the interactions by level (Interactions_by_levels) and, if tree_generation=True, the discussion trees themselves.
# Step 3: Information extraction
# The discussion must already have been processed (Step 1) and cleaned (Step 2)
filename = 'Discussion7.json'
output_dir = discussion_outdir(out_dir,filename)
information_extraction(output_dir,filename, verbose = False, tree_generation=False)
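The sketch below illustrates the kind of summaries Step 3 produces, computed from hypothetical `(id, parentId, author)` triples. The names mirror the tutorial's outputs only loosely; in particular, reading "interactions by level" as "number of posts at each reply depth" is a guess, and the real formats written by `information_extraction` may differ.

```python
from collections import Counter, defaultdict

# Hypothetical (id, parentId, author) triples; parentId None marks a top-level comment.
POSTS = [
    (1, None, "ana"),
    (2, 1, "luis"),
    (3, 2, "ana"),
    (4, 1, "marta"),
    (5, None, "luis"),
]


def extract_information(posts):
    """Compute per-discussion summaries similar in spirit to the tutorial's outputs."""
    author_of = {post_id: author for post_id, _, author in posts}
    parent_of = {post_id: parent for post_id, parent, _ in posts}

    # Unique authors (cf. Authors_List) and posts per author (cf. Author_Participation).
    authors = sorted(set(author_of.values()))
    participation = Counter(author_of.values())

    # Who replied to whom (cf. Interactions_by_AuthorID): replier -> {replied-to author: count}.
    interactions = defaultdict(Counter)
    for post_id, parent, author in posts:
        if parent is not None:
            interactions[author][author_of[parent]] += 1

    # Depth of a post: 0 for top-level comments, 1 for their replies, and so on.
    def level(post_id):
        depth = 0
        while parent_of[post_id] is not None:
            post_id = parent_of[post_id]
            depth += 1
        return depth

    # Number of posts at each depth (a guess at Interactions_by_levels).
    levels = Counter(level(post_id) for post_id in parent_of)

    return {
        "authors": authors,
        "participation": dict(participation),
        "interactions": {a: dict(c) for a, c in interactions.items()},
        "levels": dict(levels),
    }


info = extract_information(POSTS)
print(info["levels"])
```

The depth computation assumes every `parentId` points to an existing post, which is exactly what the manual cleaning in Step 2 is meant to guarantee.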
# Step 4: Discussion Tree Generation
filename = 'Discussion12.json'
output_dir = discussion_outdir(out_dir,filename)
# Load the interaction data of the discussion
forum_interactions = tree_data(output_dir, filename)
# print(forum_interactions)  # uncomment to inspect the data
# Discussion Tree Construction
dtree = DiscussionTree()
# Filling the Discussion Tree
dtree.set_tree(forum_interactions)
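Conceptually, filling a discussion tree means attaching each post to the post it replies to. The sketch below does this with a minimal, hypothetical `Node` class; it is not the `DiscussionTree` API. The virtual root (id 0, standing for the commented article) from which top-level comments hang is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    post_id: int
    author: str
    children: list = field(default_factory=list)


# Hypothetical (id, parentId, author) triples; parentId None marks a top-level comment.
POSTS = [(1, None, "ana"), (2, 1, "luis"), (3, 2, "ana"), (4, 1, "marta"), (5, None, "luis")]


def build_tree(posts):
    """Attach every post to its parent; top-level comments hang from a virtual root."""
    root = Node(0, "article")  # hypothetical root standing for the commented article
    nodes = {post_id: Node(post_id, author) for post_id, _, author in posts}
    for post_id, parent, _ in posts:
        parent_node = root if parent is None else nodes[parent]
        parent_node.children.append(nodes[post_id])
    return root


tree = build_tree(POSTS)
print([child.post_id for child in tree.children])  # → [1, 5]
```

Creating all nodes first and linking them in a second pass means the posts do not need to be sorted so that parents come before their replies.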
# Step 5: Discussion Tree Visualization
# Plotting parameters
node_radius = 0.6 # default 0.5
space_between_levels = 10 # default 5
# Other optional parameters and their defaults:
# legend_pos='Top', title=None, plot_legend=True
tree_plotting(dtree,filename, output_dir,space_between_levels, node_radius,legend_pos='Button')
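To give an intuition for the `space_between_levels` parameter, the sketch below computes node coordinates with a simple layered layout: each reply depth sits `space_between_levels` units below the previous one, leaves are spaced evenly, and each parent is centered over its children. This is a generic technique chosen for illustration, not necessarily the layout that `tree_plotting` uses; the nested-tuple tree is hypothetical.

```python
# Hypothetical tree as nested (label, children) tuples.
TREE = ("article", [("1", [("2", [("3", [])]), ("4", [])]), ("5", [])])


def layered_layout(tree, space_between_levels=10.0, leaf_spacing=1.0):
    """Return {label: (x, y)}: one row per level, parents centered over their children."""
    positions = {}
    next_leaf_x = 0.0

    def place(node, level):
        nonlocal next_leaf_x
        label, children = node
        if children:
            child_xs = [place(child, level + 1) for child in children]
            x = sum(child_xs) / len(child_xs)
        else:
            # Leaves are placed left to right at evenly spaced x positions.
            x = next_leaf_x
            next_leaf_x += leaf_spacing
        positions[label] = (x, -level * space_between_levels)
        return x

    place(tree, 0)
    return positions


pos = layered_layout(TREE, space_between_levels=10)
for label, (x, y) in sorted(pos.items()):
    print(label, x, y)
```

The resulting coordinates could then be drawn with any plotting library, with circles of radius `node_radius` at each position.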
Given a discussion and a specific user post, our tool can also reconstruct the discussion thread generated by that post.
from discussiontree.processing import discussion_thread
# parameters
id_post = 12 # specific user post
plot_legend= False # flag to plot the legend
legend_pos = 'Top' # legend position in the graph
# Discussion Thread Generation
discussion_thread(output_dir,filename,forum_interactions,id_post,
plot_legend=plot_legend)
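Assuming that the thread generated by a post means the post plus all of its direct and indirect replies, the selection can be sketched as a depth-first traversal of a parent-to-children map. The helper and the data below are hypothetical and independent of `discussion_thread`.

```python
from collections import defaultdict

# Hypothetical (id, parentId, author) triples; parentId None marks a top-level comment.
POSTS = [(1, None, "ana"), (2, 1, "luis"), (3, 2, "ana"), (4, 1, "marta"), (5, None, "luis")]


def thread_ids(posts, id_post):
    """Return id_post and all of its replies, in depth-first (pre-order) order."""
    children = defaultdict(list)
    for post_id, parent, _ in posts:
        if parent is not None:
            children[parent].append(post_id)
    result, stack = [], [id_post]
    while stack:
        current = stack.pop()
        result.append(current)
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(children[current]))
    return result


print(thread_ids(POSTS, 1))  # → [1, 2, 3, 4]
```

An explicit stack avoids Python's recursion limit on very deep reply chains.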
This functionality processes all JSON discussion files in a single call (that is, it runs Step 1 for every JSON discussion).
from discussiontree.processing import processing_all_json
# Global settings
# Input directory containing the .json files
in_dir = 'DISCUSSIONS_CLEAN'
# Output directory
out_dir = 'DISCUSSIONS_OUT2'
# JSON fields to keep for each post
source_data = ['date','author','id','parentId','diffbotUri','humanLanguage','text']
# Discussion settings
filename_prefix = 'Discussion'
discussion_number= 28
# COMPUTING
processing_all_json(in_dir, out_dir, filename_prefix, discussion_number,source_data)
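Before a batch run it can help to check that every expected file is present. The sketch below assumes, purely for illustration, that `filename_prefix` and `discussion_number` describe files named `Discussion1.json` to `Discussion28.json`; the exact numbering used by `processing_all_json` is not stated here, so adjust `start` if needed. Both helpers are hypothetical.

```python
import tempfile
from pathlib import Path


def expected_files(in_dir, filename_prefix, discussion_number, start=1):
    """Paths <prefix><i>.json for i = start..discussion_number (numbering scheme assumed)."""
    return [Path(in_dir) / f"{filename_prefix}{i}.json"
            for i in range(start, discussion_number + 1)]


def missing_files(in_dir, filename_prefix, discussion_number, start=1):
    """Names of the expected JSON files that are not present in in_dir."""
    return [path.name
            for path in expected_files(in_dir, filename_prefix, discussion_number, start)
            if not path.is_file()]


# Demo: a temporary folder that contains only Discussion1.json and Discussion3.json.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 3):
        (Path(tmp) / f"Discussion{i}.json").write_text("{}")
    missing = missing_files(tmp, "Discussion", 3)

print(missing)  # → ['Discussion2.json']
```

With the tutorial's settings, the check would be `missing_files(in_dir, filename_prefix, discussion_number)`.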
This step can only be run after the information extracted from each discussion JSON file (Steps 0 and 1) has been cleaned and manually verified (Step 2). It carries out the information extraction, tree generation and visualization (Steps 3, 4 and 5) for the posts dataset of every discussion in a single call.
IMPORTANT: if an error occurs for a specific discussion, repeat the steps explained earlier in this tutorial (Steps 0 to 5) individually for that discussion.
from discussiontree.utils import decode_json, discussion_outdir, tree_data
from discussiontree.processing import processing_json, all_information_extraction, information_extraction
# Settings
# Output directory containing the cleaned data
out_dir = 'DISCUSSIONS_OUT2'
filename_prefix = 'Discussion'
discussion_number= 28
# Processing all Information of each Discussion dataset.
all_information_extraction(out_dir, filename_prefix, discussion_number, tree_generation=True)
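To find out which discussions need to be rerun individually, one option is to process the files one by one and record failures instead of stopping at the first error. The sketch below is a hypothetical helper, not part of discussiontree; in practice `process` might wrap the Step 3 call shown earlier, for example `lambda f: information_extraction(discussion_outdir(out_dir, f), f, verbose=False, tree_generation=True)`. Whether `all_information_extraction` already handles errors internally is not stated here.

```python
def run_each(filenames, process):
    """Call process(filename) for every file and collect the failures instead of stopping."""
    failed = {}
    for filename in filenames:
        try:
            process(filename)
        except Exception as error:  # broad on purpose: record the error and continue
            failed[filename] = repr(error)
    return failed


# Stand-in for the real per-discussion work (hypothetical; it fails on one file).
def fake_process(filename):
    if filename == "Discussion2.json":
        raise ValueError("malformed tree")


names = [f"Discussion{i}.json" for i in range(1, 4)]
failed = run_each(names, fake_process)
print(failed)  # → {'Discussion2.json': "ValueError('malformed tree')"}
```

The keys of `failed` are the discussions to take through Steps 0 to 5 again, one at a time.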
# =================================================== #
# Package Description
# ---------------------------------------------------
# Copyright: Copyright 2019, DiscussionTree
# Version: 1.0
# License: GNU GPLv3
# Status: dev
# Author: {Omar Vidal}
# Email: ovidalp83@gmail.com
# Credits: [Omar Vidal, Elisa B. Ramirez]
# Maintainer: Omar Vidal
# =================================================== #